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ABSTRACT 


A  posteriori  forward  eriui  analysis  is  applied  to  the  Cront  algorithm 
with  inner  product  accumulation  in  solving  system  of  linear  algebraic 
equations  of  the  type  Ax  =  b.  By  attributing  the  generated  round-off 
errors  properly  to  the  matrices  A  and  b,  it  is  shewn,  under  certain  reason¬ 
able  assumptions ,  that  the  computed  x  satisfies  a  new  perturbed  system 
such  that  (A  +  6A)x  =  b  +  6b  and  the  upper  bounds  for  6A  and  6b  in  infinite 
norm  are  shown  to  be  proportional  to  n,  the  system  order.  This  is  an 
improvement  over  the  results  where  the  inner  products  are  not  accumulated. 
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1.  Introduction. 

In  solving  system  of  linear  equations  of  the  type  Ax  =  b  where  A  is 
a  non-singular  n-tn  order  matrix  and  b  is  an  n-vector,  the  Gaussian 
elimination  method  of  decomposing  A  into  a  product  LU  of  a  lcwer  trian¬ 
gular  matrix  L  and  an  upper  triangular  matrix  U  probably  is  the  most 
generally  used  algorithm  because  its  economy  in  the  number  of  arithmetic 
operations  required  and  the  numerical  stability  of  the  solution.  For 
normalized  floating-point  computations  with  t-bits  allocated  to  the 
mantissa  of  a  floating-point  number,  we  have 

(1  +  6)  f 1 (x*y)  =  x*y,  j6|  <  2_t  =  u  (1.1) 

where  *  is  any  of  the  operators  +  ,  -,  *,  /.  Under  the  condition  of  (1.1), 
it  is  shown  [1]  that  the  computed  x  using  the  Gaussian  algorithm  satisfies 
a  new  perturbed  system  such  that 

(A  +  6A)x  =  b  +  6b  (1.2) 

and 

I  l6Al  L  i  fa2  •  1)0U» 

(1.3) 

|  |6pj  |m  <  (n2  +  n  -  1  +  ncr)pu 

where  a  and  p  are  some  constants  obtainable  after  the  computation.  Equation 
(1.1)  implies  that  for  the  given  two  nunbers  x  and  y  with  t-bits  mantissae, 
fl(x*y)  is  the  correctly  rounded  result  of  the  floating  operation  *.  It 
is  shown  that  the  operations  +  and  -  are  ill-conditioned  in  the  sense  of 
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Rice  [2]  if  |x  ±  y|  is  very  small.  In  othe-  words,  there  is  a  loss  of 
significance  if  |x  ±  y|  is  considerably  smaller  than  |x|  +  jy| .  However, 
the  relative  condition  of  x  *  y  or  x  4  y  is  a  constant  2.  Hence  for  ill- 
conditioned  systems,  the  loss  of  significance  due  to  additive  operations 
in  the  early  stages  of  computation  might  lead  to  unacceptable  final  solu¬ 
tions.  Thus  one  remedy  for  such  systems  is  the  use  of  higher-precision 
arithmetics  at  the  expense  of  more  computing  time  and  memory  space. 

Another  alternative  is  to  add  or  subtract  in  double -precis ion  whereas 
multiplication  and  division  could  still  be  done  in  single -precis ion. 
Furthermore,  the  result  of  a  single -precis ion  multiplication  can  easily 
be  retained  in  double-precision  and  used  later.  This  is  extremely  helpful 
in  the  computation  of  inner  products.  This  type  of  computation  can  thus 
be  called  "accumulated  inner  product"  arithmetic. 

The  Crout  variation  of  the  Gaussian  elimination  methods  is  essentially 
a  sequence  of  inner  product  computations.  Hence  the  use  of  accumulated 
inner  product  should  improve  the  accuracy  in  the  final  solutions.  In  this 
paper  we  will  carry  out  the  a  posteriori  forward  error  analysis  [3]  of  the 
Crout  algorithm  with  accumulated  inner  product.  The  results  show  that  the 
computed  solution  satisfies  a  perturbed  system  similar  to  (1.2)  with  bounds 
for  the  perturbations  proportional  to  n  under  certain  practical  assumptions . 

2.  Accumulated  inner  product. 

We  will  assume  that  the  given  digital  computer  will  be  able  to  perform 
the  following  operations. 
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(i)  Addition  and  subtraction.  The  machine  accents  numbers  in 
double -precision  mantissa  and  produces  a  result  having  a  double-precision 
mantissa. 

(ii)  Division.  The  machine  accepts  a  double -precis ion  dividend 
and  a  single  precision  divisor,  giving  a  single-precision  quotient. 

(iii)  Multiplication.  The  machine  accepts  single-precision  factors 
and  gives  a  double-precision  product. 

Furthermore,  if  a  single -precis ion  number  has  a  t-bit  mantissa,  then 
a  double-precision  number  will  have  2  t-bits  for  the  mantissa.  Extending 
(1.1)  to  (i),  (ii)  and  (iii),  we  have  the  following  lemma: 

Lenina  2.1.  If  a,  b  are  single -precis ion  numbers  and  x,  y  are  double¬ 
precision  numbers,  then  we  have 


(1 + 

6) 

fl(x  +  y) 

=  x  +  y. 

l«l 

<  2'2t  -  u2; 

(2.1) 

(1  ♦ 

6’) 

fl(x  -  y) 

II 

X 

1 

'C 

>• 

l«l 

<  2~2t  =  u2; 

(2.2) 

fl(ab) 

=  ab  ; 

(2.3) 

(1  + 

A) 

f 1 (x/a) 

*  x/a  , 

1 A  | 

<  2't  =  u  . 

(2.4) 

We  see  that  the  results  of  the  operations  +,  -,  and  /  are  the  correctly 
rounded  results  and  the  operation  multiplication  is  exact. 

We  can  now  consider  the  computation  of  the  following  general  inner 
product  by  accumulation: 


n 

p  *  fl 

1  a.b. 

U=i  “l 

/y 

(2.5) 
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execution  of  (2.5)  can  be  carried  out  by  the  following  recursive 
sec  lence : 


S1  =  fl  Ca-j^b-, ) , 

sk+l  =  £1(>k  +  VlW*  1  <  k  <  n-1, 

P  =  fl(sn/y). 

Applying  Lemma  2.1  to  (2.6),  we  have 
S1  =  albi* 

(1  *  Wvi  ■  sk  *  viVi-  1 W  i  “2. 

(1  +  A)p  =  sn/y,  | A 1  <  u 

Combining  (2.7)  for  k  =  1,  2,  ••*,  n-1,  we  have 


p  +  e  = 


i=l 


aibi 


/y 


1  £  k  <_  n-1. 


(2.6) 


(2.7) 


(2.8) 


where 


i  n*1 
=  pA  +  -  l  <$ 
Y  k=l 


k+lsk+l 


(2.9) 


We  note  that  if  y  -  1  the  last  step  in  (2.6)  is  actually  a  double -precis ion 
to  single -precis ion  conversion,  hence  the  last  equation  in  (2.7)  is  still 
valid.  To  bound  the  error  in  (2.8),  let  us  denote  by  a  the  magnitude  of 
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the  absolute  maximum  of  the  computed  nunbers  in  (2.6),  namely. 


o  -  max  (|p| ,  |sj). 
2<k<n  K 


(2.10) 


Then  the  upper  bound  for  e  in  (2.8)  is 
|e|  1  [i  +  j^p(n  -  l)u)ou 
For  | y |  =  1,  then  we  have  a  simplified  equation 
| e |  <_  [1  +  (n  -  l)u]cu 

Thus  we  have  established  the  following  lemma: 

Lemma  2.2.  The  accumulated  inner  product  of  (2.9)  satisfies 


(2.11) 


(2.12) 


p  +  e  = 


n 


7  a-b. 

i-1  1  1 


(2.13) 


where 

| e |  <  [1  +  rrr(n  -  l)u]ou  (2.14) 

“  1/ 1 

and  o  is  defined  in  (2.10). 

From  (2.14)  we  see  the  round-off  error  in  the  accumulated  inner  pro¬ 
duct  is  very  small  if  n  is  not  too  large  and  |y|  is  not  too  small.  This 
is  of  course  what  we  have  expected  in  using  double -precis ion  additive 
operations . 
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!.  The  Crout  algorithm  with  accumulated  inner  product. 

The  usual  Gaussian  elimination  method  allows  us  to  obtain  the  elements 
of  the  matrices  L  and  U  by  a  sequence  of  eliminations  of  the  variables. 

On  the  other  hand,  the  Crout  algorithm  determines  the  L  and  U  directly 
from  the  matrix  equation  LU  =  A.  Specifically,  if  L  is  a  unit-diagonal 
lower  triangular  matrix,  then  we  have 


£  £ 

31  32 


“ll  u12  u]3  '  •  •  “in' 

U22  U23  u2n 

U33  u3n 


1 


£  .  £  ,  a.  _ 

nl  n2  no 


•  • 


• 

•  • 

1 

u 

J  . 

nnj 

all 

a12 

a13  ‘ 

*  * 

In 

a21 

a22 

a23  ' 

•  •  a,, 

2n 

a31 

a32 

a33  * 

•  •  a, 

3n 

• 

• 

• 

• 

•  • 

• 

• 

•  • 

(3.1)  ! 

>1 

• 

an2 

• 

an3  * 

•  • 

•  •  a 

nn 

1 

If  we  write  out  (3.1)  in  full,  we  see  that  the  first  row  of  U  is  given 

by  the  equation  u^  =  a^,  u^  =  a^,  •**,  u^n  =  a^n  and  the  first  column 

of  L  may  then  be  obtained  from  the  equations  ^^ll  =  a2i»  ^3iun  =  a3i» 

t  ,u„  =  a  We  can  then  solve  for  the  second  rcw  of  U  and  the 
’  nl  11  nl 
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second  column  of  L  and  so  on.  The  corrputational  equations  which  give  u 
and  l. .  in  terms  of  previously  computed  quantities  are 


" 

i-1 

u.  .  -  fl 
1J 

aij  *  ^ik^j 

/I 

fl 


> "  Ji  ^ 


/u.  • 


j  >  i  >  1 


(3.3) 


Thus  the  determination  of  u^  and  can  be  carried  out  by  confuting 
a  corresponding  accumulated  inner  product.  The  partial  pivoting  strategy 
can  still  be  employed  here  if  the  sequence  of  (3.2)  and  (3.3)  are  slightly 
altered  to  allow  a  search  for  the  largest  element  in  a  column  as  pivot. 
This  is  described  in  Wilkinson  [4] .  We  will  assume  that  the  row  inter¬ 
changes  has  been  done  in  advance  so  that  no  pivoting  is  necessary  and  the 
elements  of  L  are  all  of  magnitudes  less  than  or  equal  to  one. 

Applying  Lenina  2.2  to  (3.2)  and  (3.3),  we  have 


i-1 


u. .  +  e. .  =  a 
ij 


0  .  .  +  G  .  .  =  — 1 

ji  Ji  u 


li 


i-1 


"ji  ’  Ji  £jkUki 


j  >  i  1  1 


(3.4) 

(3.5) 


where 

ley  |  <  [1  ♦  (i  -  Dujoyu,  j  >  i  >  1 


(3.6) 
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and 


! e -- 1  <  [1  +  tjj— r(i  -  l)u]a..u,  j  >  i  >_  1. 
J  ii1  J 


Now  equations  (3.4)  and  (3.5)  can  also  be  written  as 


i-1 


uij  *  J/ik'Vj  *  eij  ■  V  ii1’1- 


i-1 


+  J  L.u.  .  +  e-.u..  =  a...  j  >  i  >  1. 

ji  ii  jk  ki  ji  ii  ji*  J  — 


Combining  (3.8)  and  (3.9)  in  matrix  notation,  we  have 


LU  +  F  *  A 


where  F  =  (fy)  and 

l£ijl  -  le^  |  <  [1  +  (i  -  l)u]a.jU,  j  >  i  >  1, 

I f j i I  =  l£jiuiil  £  C I uii I  +  (i  -  Dula^u,  j  >  i 
Now  let  us  define 


|F|  -  Clfyl). 

p  *  max  |  in  i  | , 
a  =  max|a. . | . 

i.j  J 

Then  we  have 


(3.7) 

(3.8) 

(3.9) 

(3.10) 

(3.11) 
>  1.  (3.12) 

(3.13) 
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|  F  |  <_  ou 


0 

1+u 

p+u 


0 

1+u 

l+2u 


•  • 


0 

1+u 

l+2u 


Lp  p+u  p+2u  •  •  p+(n-2)u  l+(n-l)uj 


(3.14) 


The  upper  bound  of  the  infinite  norm  of  F  can  thus  be  estimated  as 
[n  +  u]ou,  for  p  <  1; 


l|F||  <  { 


(3.15) 


[  (n- 1) p  +  1  +  — ujcru,  for  p  >  1. 


Note  that  in  (3.15)  p  and  a  are  not  necessarily  equal  unless  column  inter¬ 
changes  are  done  in  advance  to  assure  that  a  =  p. 

Thus  we  have  established  the  following  lemma: 

Lemma  3.1.  The  Crout  algorithm  of  directly  decomposing  A  into  a  pro¬ 
duct  LU  by  using  accumulated  inner  product  gives  us  triangular  matrices 
L  and  U  such  that 


LU  +  F  -  A  (3.16) 

where  the  upper  bound  for  F  can  be  estimated  using  (3.15). 

From  (3.15)  we  see  that  the  dominating  factor  in  the  error  bounds  is 
tkxu  or  (n-l)pou  since  usually  we  have  u  «  1  for  most  of  the  existing 

general  purpose  machines.  For  example,  for  the  IEM  360  series,  we  have 
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t  =  24  and  hence  2  ^  is  approximately  equal  to  1  if  n  *  2900 

which  is  far  more  than  the  system  order  we  encounter  in  practice.  Thus  we 
could  ignore  this  small  term  and  the  actual  upper  bounds  for  F  is  approxi¬ 
mately  proportional  to  the  system  order  n. 

Nor  we  can  solve  the  decomposed  system 

LUx  =  b  (3.17) 

in  the  sequence 

Ly  =  b  (3.18) 


and 


Ux  =  y. 


(3.19) 


Again  accumulated  inner  product  is  used  to  solve  for  y  and  x  by  the  com 
putational  equations 


y  i  •  bi 


n  ■ fl 


r  i-i  ) 

bi  -  J,  V:  n 


,  2  £  i  £  n 


and 


fl 


k-i 

[*  ■  Vj 

1  <  k  <  n 


(3.20) 


(3.21) 


> 


j 


1 
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We  can  similarly  apply  Lemma  2.2  to  (3.20)  and  (3.21).  The  results 
are  summarized  in  the  following  lemma: 

Lemma  3.2.  The  computed  solutions  y  and  x  of  the  triangular  systems 
(3.18)  and  (3.19)  by  the  use  of  accumulated  inner  product  satisfy 

Ly  +  e  =  b  (3.22) 

Ux  +  e  =  y  (3.23) 


where  the  absolute  vectors  of  e  and  e  satisfy 


e|  <  abu 


1 

lm 

l+2u 


[l+(n-l)uj 


(3.24) 


|  e |  <  ayu 


p+(n-l)u 

p+(n-2)u 


(3.25) 


L  p+(o)u  J 

and  ob  or  o  are  the  magnitude  of  the  absolute  maximum  value  generated 
during  the  amputation  of  x  or  y  respectively.  Combining  Lemma  3.1  and 
Lenina  3.2,  we  have  the  following  theorem: 

Theorem  3.1.  The  solution  x  computed  by  the  Crout  algorithm  with 
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accumulated  inner  product  satisfies 


(A  +  6A)x  =  b  +  6b  (3.26) 

where  6A  =  -F  and  6b  =  -e  -  Le.  Furthermore, 

IIMIL-Ilrll.  (3-27) 

Il'iMLi  Ikll.  *  I  ML  (3-2« 

where 

| |e| |ot  <  abu[l  >  (n  -  l)u]  (3.29) 

||Le!L  <  oyu[np  +  u]  (3.30) 


Thus  we  see  if  the  assumption  that  n^'  — ^  u  «  1  is  true,  then  the 
computed  solution  satisfied  a  perturbed  system  of  (3.26)  with  upper  bounds 
lor  the  perturbations  6A  and  6b  proportional  to  n.  Hence  in  solving  higher 
order  system  of  linear  algebraic  equations ,  the  Crout  algorithm  with 
accumulated  inner  product  should  be  used  to  avoid  loss  of  significance  at 
all  stages  of  computation.  This  is  especially  important  for  ill-conditioned 
systems  where  rows  or  columns  are  usually  more  or  less  dependent. 

4.  Numerical  experiment . 

To  see  how  the  accumulation  of  inner  product  affects  the  solution 

accuracy,  we  have  solved  a  5  by  5  matrix  problem  of  the  type  Ax  =  b  where 

T 

A  is  a  5-th  order  inverse  Hilbert  matrix  and  b  =  (1,  0,  0,  0,  0)  .  Hence 
the  exact  solution  is  x  =  (1,  1/2,  1/3,  1/4,  1/5).  \n  arbitrary  precision 
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arithmetic  unit  [5]  is  used  to  simulate  a  24 -digits  mantissa  chopped 
floating-point  arithmetic  system.  The  results  with  and  without  accumula¬ 
tion  of  inner  products  are  listed  in  the  following  table: 


Without  Accumulation 


Xj  =  0.1000 

0000 

0000 

0000 

0000 

8947 

(10’) 

=  0.5000 

0000 

0000 

0000 

0006 

6930 

(10°) 

x^  =  0.3333 

3333 

3333 

3333 

3338 

6805 

(10°) 

x^  =  0.2500 

0000 

0000 

0000 

0004 

4524 

(10°) 

Xj.  =  0.2000 

0000 

0000 

0000 

0003 

8143 

(10°) 

With  Accumulation 

x .  =  0.1000 

0000 

0000 

0000 

0000 

0020 

(10’) 

X2  =  0.5000 

0000 

0000 

0000 

0000 

0177 

(10°) 

x3  =  0.3333 

3333 

3333 

3333 

3333 

3488 

(10°) 

x4  =  0.2500 

0000 

0000 

0000 

0000 

0135 

(10°) 

x5  =  0.2000 

0000 

0000 

0000 

0000 

0117 

(10°) 

Table  4.1.  Numerical  Results  of  Solving  Ax  =  b. 

We  see  from  Table  4.1  that  two  more  significant  digits  are  obtained  in 
all  of  the  solution  components  when  accumulation  of  inner  products  is  used 
in  the  Crout  algorithm.  The  absolute  error  in  each  component  is  decreased 
by  a  factor  of  300  to  400  with  accumulation. 


5.  Conclusions. 

We  have  shown,  by  the  a  posteriori  error  analysis,  that  the  computed 
results  of  the  Crout  algorithm  with  inner  product  accumulation  satisfy  a 
perturbed  system  and  the  upper  bounds  for  the  perturbations  are  proportional 
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to  the  system  order  n  under  certain  practical  assumptions .  The  improve¬ 
ment  in  accuracy  is  basically  due  to  the  effort  to  avoid  loss  of  signifi¬ 
cance  in  additive  operations.  This  is  confirmed  by  the  results  of  our 
numerical  experiment.  Indeed  the  inner  product  accumulation  should  be 
done  in  every  computation  whenever  it  is  possible. 

We  should  also  note  that  the  Crout  algorithm  is  no  more  than  an 
"analytic"  process  where  first  the  matrix  A  is  decomposed  into  factors  L 
and  U  and  later  on  the  vector  b  is  decomposed  into  L  and  y  and  subsequently 
y  is  decomposed  into  U  and  the  desired  x.  Hence  our  a  posteriori  analysis 
can  only  give  us  bounds  of  the  difference  between  the  computed  decomposi¬ 
tion  LU  and  the  exact  decomposition  A  or  the  difference  between  the  computed 
decomposition  LUx  and  the  exact  decomposition  b.  In  order  to  find  the 
difference  between  the  computed  x  and  the  exact  solution  A  *b  we  need  the 
information  of  A1  which  is  of  course  unavailable  unless  the  decomposition 
is  also  used  to  obtain  an  approximate  inverse  of  A. 
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